
    Fine-tuning Language Models for Factuality

    The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reductions in factual error rate when generating biographies and answering medical questions, respectively.
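    The direct preference optimization objective mentioned in the abstract can be sketched in a few lines. The following is a minimal illustration of the per-pair DPO loss, assuming the inputs are summed token log-probabilities of the chosen (more factual) and rejected (less factual) responses under the policy being trained and under a frozen reference model; `beta` is the usual KL-strength hyperparameter. This is a sketch of the objective, not the authors' training code.

    ```python
    import math

    def dpo_loss(logp_chosen, logp_rejected,
                 ref_logp_chosen, ref_logp_rejected, beta=0.1):
        """DPO loss for one (chosen, rejected) preference pair.

        Each argument is the total log-probability of a full response
        under the policy (logp_*) or the frozen reference (ref_logp_*).
        """
        # Implicit reward margin: how much more the policy prefers the
        # chosen response over the rejected one, relative to the reference.
        margin = beta * ((logp_chosen - ref_logp_chosen)
                         - (logp_rejected - ref_logp_rejected))
        # -log(sigmoid(margin)): the loss shrinks as the policy learns to
        # favor the factually preferred response more than the reference does.
        return -math.log(1.0 / (1.0 + math.exp(-margin)))
    ```

    At a margin of zero (policy identical to the reference) the loss is log 2, and it decreases monotonically as the policy tilts toward the preferred response.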

    Search and Rescue under the Forest Canopy using Multiple UAVs

    We present a multi-robot system for GPS-denied search and rescue under the forest canopy. Forests are particularly challenging environments for collaborative exploration and mapping, in large part due to the existence of severe perceptual aliasing which hinders reliable loop closure detection for mutual localization and map fusion. Our proposed system features unmanned aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning. When communication is available, each UAV transmits compressed tree-based submaps to a central ground station for collaborative simultaneous localization and mapping (CSLAM). To overcome high measurement noise and perceptual aliasing, we use the local configuration of a group of trees as a distinctive feature for robust loop closure detection. Furthermore, we propose a novel procedure based on cycle consistent multiway matching to recover from incorrect pairwise data associations. The returned global data association is guaranteed to be cycle consistent, and is shown to improve both precision and recall compared to the input pairwise associations. The proposed multi-UAV system is validated both in simulation and during real-world collaborative exploration missions at NASA Langley Research Center. Comment: IJRR revision.
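    The cycle-consistency idea behind the matching step can be illustrated with a toy filter over three maps. The sketch below is a deliberately simplified check, assuming each `match_xy` dict maps a landmark id in map X to its proposed correspondence in map Y; an association is kept only if following the cycle A → B → C → A returns to the starting landmark. The paper's actual method is a multiway matching optimization over many maps, not this pairwise filter.

    ```python
    def cycle_consistent(match_ab, match_bc, match_ca):
        """Keep only the A->B associations whose cycle A -> B -> C -> A
        closes back on the starting landmark in map A.

        Inconsistent cycles indicate an incorrect pairwise data
        association (e.g. caused by perceptual aliasing between trees).
        """
        consistent = {}
        for a, b in match_ab.items():
            c = match_bc.get(b)             # follow B -> C
            if c is not None and match_ca.get(c) == a:  # close C -> A
                consistent[a] = b
        return consistent
    ```

    For example, if landmark 2's chain ends at the wrong landmark in map A, that association is discarded while consistent ones survive.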

    Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

    A trustworthy real-world prediction system should be well-calibrated; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to a more expensive expert in cases of low-confidence predictions. While recent studies have shown that unsupervised pre-training produces large language models (LMs) that are remarkably well-calibrated, the most widely-used LMs in practice are fine-tuned with reinforcement learning from human feedback (RLHF-LMs) after the initial unsupervised pre-training stage, and results are mixed as to whether these models preserve the good calibration of their ancestors. In this paper, we conduct a broad evaluation of computationally feasible methods for extracting confidence scores from LLMs fine-tuned with RLHF. We find that with the right prompting strategy, RLHF-LMs verbalize probabilities that are much better calibrated than the model's conditional probabilities, enabling fairly well-calibrated predictions. Through a combination of prompting strategy and temperature scaling, we find that we can reduce the expected calibration error of RLHF-LMs by over 50%.
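    The two quantities at the heart of this abstract, expected calibration error and temperature scaling, are both simple to state in code. Below is a minimal stdlib-only sketch: ECE bins predictions by confidence and averages the gap between a bin's mean confidence and its accuracy, weighted by bin size; temperature scaling divides logits by a scalar T before the softmax, softening confidences when T > 1. Function names and the 10-bin default are illustrative choices, not from the paper.

    ```python
    import math

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: bin by confidence, then take the size-weighted average of
        |mean confidence - accuracy| across non-empty bins."""
        bins = [[] for _ in range(n_bins)]
        for conf, ok in zip(confidences, correct):
            idx = min(int(conf * n_bins), n_bins - 1)
            bins[idx].append((conf, ok))
        ece, n = 0.0, len(confidences)
        for b in bins:
            if not b:
                continue
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(avg_conf - accuracy)
        return ece

    def scale_temperature(logits, T):
        """Temperature scaling: divide logits by T before the softmax.
        T > 1 softens (lowers) the top confidence; T < 1 sharpens it."""
        z = [l / T for l in logits]
        m = max(z)                      # subtract max for numerical stability
        exps = [math.exp(v - m) for v in z]
        total = sum(exps)
        return [e / total for e in exps]
    ```

    A model that answers with 95% confidence but is right only 75% of the time in that bin contributes a 0.20 gap, which is exactly the kind of overconfidence a tuned temperature can shrink.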

    Dickkopf-related protein 1 (Dkk1) regulates the accumulation and function of myeloid derived suppressor cells in cancer

    Tumor–stroma interactions contribute to tumorigenesis. Tumor cells can educate the stroma at primary and distant sites to facilitate the recruitment of heterogeneous populations of immature myeloid cells, known as myeloid-derived suppressor cells (MDSCs). MDSCs suppress T cell responses and promote tumor proliferation. One outstanding question is how the local and distant stroma modulate MDSCs during tumor progression. Down-regulation of β-catenin is critical for MDSC accumulation and immune suppressive functions in mice and humans. Here, we demonstrate that stroma-derived Dickkopf-1 (Dkk1) targets β-catenin in MDSCs, thus exerting immune suppressive effects during tumor progression. Mice bearing extraskeletal tumors show significantly elevated levels of Dkk1 in the bone microenvironment relative to the tumor site. Strikingly, Dkk1 neutralization decreases tumor growth and MDSC numbers by rescuing β-catenin in these cells, and restores T cell recruitment at the tumor site. Recombinant Dkk1 suppresses β-catenin target genes in MDSCs from mice and humans, and anti-Dkk1 loses its antitumor effects in mice lacking β-catenin in myeloid cells or after depletion of MDSCs, demonstrating that Dkk1 directly targets MDSCs. Furthermore, we find a correlation between CD15(+) myeloid cells and Dkk1 in pancreatic cancer patients. We establish a novel immunomodulatory role for Dkk1 in regulating tumor-induced immune suppression by targeting β-catenin in MDSCs.

    Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study.

    Background: Data derived from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study were analyzed in an effort to employ machine learning methods to predict the composite endpoint described in the original study. Methods: We identified 573 CORAL subjects with complete baseline data and the presence or absence of a composite endpoint for the study. These data were subjected to several models, including a generalized linear (logistic-linear) model, support vector machine, decision tree, feed-forward neural network, and random forest, to predict the composite endpoint. The subjects were randomly divided into training and testing subsets according to an 80%:20% distribution with various seeds. Prediction models were optimized within the CARET package of R. Results: The best performance among the machine learning techniques was that of the random forest method, which yielded an area under the receiver operating characteristic (ROC) curve of 68.1%±4.2% (mean ± SD) on the testing subset across ten different seed values used to separate training and testing subsets. The four most important variables in the random forest method were SBP, serum creatinine, glycosylated hemoglobin, and DBP. Each of these variables was also important in at least some of the other methods. The treatment assignment group was not consistently an important determinant in any of the models. Conclusion: Prediction of a composite cardiovascular outcome was difficult in the CORAL population, even when employing machine learning methods. Assignment to either the stenting or best medical therapy group did not serve as an important predictor of composite outcome. Clinical Trial Registration: ClinicalTrials.gov, NCT00081731
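    The ROC area reported above has a simple probabilistic reading: it is the probability that a randomly chosen subject who reached the endpoint is scored higher than a randomly chosen subject who did not. The sketch below computes that area directly via the Mann-Whitney statistic; it is a stdlib-only illustration of the metric, not the CARET pipeline used in the study, and the variable names are illustrative.

    ```python
    def roc_auc(scores, labels):
        """Area under the ROC curve via the Mann-Whitney statistic:
        the fraction of (positive, negative) pairs in which the positive
        subject receives the higher risk score, counting ties as half."""
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        return wins / (len(pos) * len(neg))
    ```

    A model whose scores perfectly separate the classes gives an area of 1.0, while uninformative scores hover around 0.5, which puts the study's 68.1% result in context as modest but better than chance.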